 christopher columbus




Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

arXiv.org Artificial Intelligence

We introduce Inference-Time Intervention (ITI), a technique designed to enhance the "truthfulness" of large language models (LLMs). ITI operates by shifting model activations during inference, following a set of directions across a limited number of attention heads. This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLaMA called Alpaca, ITI improves its truthfulness from 32.5% to 65.1%. We identify a trade-off between truthfulness and helpfulness and demonstrate how to balance it by tuning the intervention strength. ITI is minimally invasive and computationally inexpensive. Moreover, the technique is data efficient: while approaches like RLHF require extensive annotations, ITI locates truthful directions using only a few hundred examples. Our findings suggest that LLMs may have an internal representation of the likelihood of something being true, even as they produce falsehoods on the surface.
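
The core idea in the abstract above, shifting the activations of a few attention heads along fixed directions with a tunable strength, can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the dictionary layout, and the choice to estimate each direction as the difference between mean "true" and mean "false" activations are all assumptions made for illustration, and the abstract does not specify how heads are selected or how the shift is scaled.

```python
from statistics import fmean


def mean_difference_direction(true_acts, false_acts):
    """Unit vector pointing from the mean 'false' activation to the mean 'true' one.

    Assumption: directions are estimated from class means. Both inputs are
    lists of equal-length activation vectors (lists of floats), and the two
    means must differ, otherwise the norm below is zero.
    """
    dim = len(true_acts[0])
    diff = [
        fmean(a[i] for a in true_acts) - fmean(a[i] for a in false_acts)
        for i in range(dim)
    ]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]


def intervene(head_outputs, directions, alpha):
    """Shift only the selected heads; every other head is copied unchanged.

    head_outputs: dict mapping a head id to its activation vector
    directions:   dict mapping each selected head id to a unit direction
    alpha:        intervention strength (hypothetical knob; 0 disables the shift)
    """
    shifted = {}
    for head, act in head_outputs.items():
        if head in directions:
            shifted[head] = [a + alpha * d for a, d in zip(act, directions[head])]
        else:
            shifted[head] = list(act)
    return shifted


# Toy data: head ids are (layer, head) tuples, activations are 2-dimensional.
direction = mean_difference_direction(
    true_acts=[[1.0, 2.0], [3.0, 2.0]],
    false_acts=[[0.0, 2.0], [0.0, 2.0]],
)
heads = {(0, 0): [0.5, 0.5], (0, 1): [1.0, 1.0]}
result = intervene(heads, {(0, 0): direction}, alpha=2.0)
```

In this sketch, raising `alpha` pushes the selected head further along its direction, which is where the trade-off between truthfulness and helpfulness mentioned in the abstract would be tuned; heads without an entry in `directions` are left alone, mirroring the "limited number of attention heads" wording.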


ChatGPT: Optimizing Language Models for Dialogue

#artificialintelligence

We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response. We are excited to introduce ChatGPT to get users' feedback and learn about its strengths and weaknesses. During the research preview, usage of ChatGPT is free.


While everyone waits for GPT-4, OpenAI is still fixing its predecessor

MIT Technology Review

ChatGPT appears to address some of these problems, but it is far from a full fix, as I found when I got to try it out. This suggests that GPT-4 won't be either. In particular, ChatGPT, like Galactica (Meta's large language model for science, which the company took offline earlier this month after just three days), still makes stuff up. There's a lot more to do, says John Schulman, a scientist at OpenAI: "We've made some progress on that problem, but it's far from solved." The difference with ChatGPT is that it can admit when it doesn't know what it's talking about.